ABSTRACT

Given i.i.d. observations $X_1, X_2, X_3, \ldots, X_n$ drawn from a mixture of normal terms, one is
often  interested  in  determining  the  number  of  terms  in  the  mixture  and  their  defining 
parameters.  Although  the  problem  of  determining  the  number  of  terms  is  intractable  under 
the  most  general  assumptions  there  is  hope  of  elucidating  the  mixture  structure  given 
appropriate  caveats  on  the  underlying  mixture.  This  paper  examines  a  new  approach  to 
this  problem  based  on  the  use  of  Akaike  Information  Criterion  (AIC)  based  pruning  of 
data  driven  mixture  models  which  are  obtained  from  resampled  data  sets.  Results  of  the 
application  of  this  procedure  to  artificially  generated  and  real  world  data  sets  are  provided. 
1.  Introduction


Given $X = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ is $d$ dimensional and i.i.d. according to an unknown density $f_0(x)$, one is often interested in estimating $f_0(x)$. This problem occurs in such areas as exploratory data analysis, classification, and regression. There are a variety of approaches to the multivariate density estimation problem (Scott, 1992).


An often used parametric approach is that of finite mixture models (Everitt and Hand, 1981) in combination with the expectation maximization (EM) method of Dempster, Laird, and Rubin (1977). One difficulty with this tactic is that one needs some idea as to the appropriate number of terms in the mixture model as well as the approximate parameter values. Given this information the EM algorithm is guaranteed to converge to at least a local maximum in the likelihood surface.

Some  of  the  previous  nonparametric  approaches  include  histograms  (Sturges,  1926), 
frequency  polygons  (Scott,  1985a),  adaptive  histograms  (Wegman,  1970),  average  shifted 
histograms  (Scott,  1985b),  and  kernel  estimators  (Silverman,  1986).  These  approaches  are 
beneficial  in  that  they  possess  nice  asymptotic  consistency  properties,  robustness  with 
regard  to  nonnormality,  and  fewer  parameters  to  estimate  which  implies  better  estimates  in 
the  finite  sample  regime.  They  are  at  a  disadvantage  as  compared  to  the  mixture  model 
approach  when  it  is  suspected  that  the  unknown  true  density  is  a  mixture  of  a  number  of 
terms and one would like to estimate the posterior probability of underlying term membership for an unlabeled observation.

This  type  of  problem  exists  in  the  areas  of  medical  diagnosis  and  image  processing.  In 
medical diagnosis the term membership may play an important role in identification of the underlying mechanism of disease or the identification of appropriate tissue type (Carman and Merickel, 1990). In the general problem of image analysis the term membership may pertain to region type.

A recently developed density estimation technique that circumvents some of the problems of the above techniques is the adaptive mixtures procedure of Priebe and Marchette (1993). This procedure is a blend of the finite mixtures and kernel estimator approaches. It is essentially a mixtures type approach that allows for the creation of new terms in a data driven manner. We have successfully applied this technique in combination with fractal-based features to the detection of man-made objects in land (Solka, Priebe, and Rogers, 1992) and aerial (Priebe, Solka, and Rogers, 1993) images, the general problem of texture classification (Solka, Priebe, and Rogers, 1993), and the measurement of breast parenchymal tissue density (Priebe et al., 1994). The adaptive mixtures estimator is asymptotically consistent like the kernel estimator, but it has the added benefit of creating additional terms at a rate which is considerably lower than the rate of $n$ terms for $n$ observations associated with the kernel estimator.

One drawback to the adaptive mixtures estimator is that while there is asymptotic $L_1$ convergence for the procedure, this convergence is achieved through the creation of an asymptotically infinite number of terms. Thus the procedure will result in an overly complex model given enough data.

In this work we are interested in a model whose complexity more closely matches that of the unknown distribution. The approach is to use the adaptive mixtures procedure as a starting point to generate a mixture model with (potentially many) extra degrees of freedom or parameters and to prune this model to a much smaller mixture model. This pruning of terms, which is based on the use of the Akaike Information Criterion (AIC), is performed to obtain a model that matches the underlying distribution not only in a functional sense but also with regard to model complexity. Subsequent sections will detail how well our pruning based procedure meets these goals.

The AIC was originally developed as a tool to choose between two statistical estimators of differing complexities (Akaike, 1972). The AIC is written as a function of the likelihood $L$ and number of free parameters $M$ in a model as follows:

$$\mathrm{AIC}(f) = -2\ln(L) + 2M.$$

In Akaike's original paper the AIC was applied to time series analysis, but subsequent work has applied the technique to ISODATA based clustering (Carman and Merickel, 1990), general clustering (Bozdogan and Sclove, 1984), and finite mixture analysis (Liang, Jaszczak, and Coleman, 1992).

We have developed a new approach to finite mixture determination which employs AIC based pruning of adaptive mixtures density estimation (AMDE) estimates. This approach differs from the work of Liang in two ways. Liang chooses to make an initial guess as to the appropriate complexity of the data's finite mixture model and then adjusts the number of terms in the model up and down by adding and removing a term until no further improvement is possible. Our approach begins with an overdetermined data driven model that is produced by the AMDE procedure and then uses AIC in combination with the expectation maximization technique to prune superfluous terms from the model. So whereas Liang's approach both adds terms to and removes terms from the model, our approach only removes terms. The second difference is that where Liang's approach is to produce a single best solution to the finite mixtures question, our approach produces a distribution of model complexities from which estimates of the appropriate model complexity can be made.


Section 2 develops the methodology for term pruning in the case of finite mixture models obtained from finite sample application of the adaptive mixtures procedure. Section 3 presents results indicating that we can improve upon an overdetermined mixture model and in some cases determine the true model complexity. Section 4 concludes with a discussion of the relevance of these results.

2.  Approach 

Our  approach  combines  elements  of  nonparametric  density  estimation,  parametric 
density  estimation,  and  information  based  pruning.  The  nonparametric  AMDE  is  used  as 
the starting point of our procedure. We begin our discussion with an overview of AMDE.

2.1  Adaptive mixtures density estimation

Given an unknown distribution $f_0(x)$ we seek to model the distribution using $f(x; \Psi)$ defined by

$$f(x; \Psi) = \sum_{i=1}^{g} \pi_i K(x; \theta_i) \qquad (1)$$

where $K$ is some fixed density parameterized by $\theta_i$, and $\Psi = \{\pi_1, \theta_1, \pi_2, \theta_2, \ldots, \pi_g, \theta_g\}$. The $\pi_i$'s are referred to as the mixing proportions. (We can assume for much of what follows that $K$ is taken to be the normal distribution, in which case $\theta_i$ becomes $\{\mu_i, \Sigma_i\}$.) In the simplest case the mixture is assumed to have a single term and the parameters that need to be estimated are the mean and covariance of the distribution.
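
As a concrete illustration of the mixture form above, the following minimal sketch evaluates a univariate normal mixture. The function names and the representation of each term as a (proportion, mean, variance) tuple are assumptions made for this example only.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, terms):
    """Evaluate f(x) = sum_i pi_i * N(x; mu_i, var_i).

    `terms` is a list of (pi_i, mu_i, var_i) tuples (an illustrative
    representation); the pi_i are expected to sum to one.
    """
    return sum(p * normal_pdf(x, m, v) for p, m, v in terms)


# Two-term example: .5*N(-2,1) + .5*N(2,1)
terms = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
density_at_zero = mixture_pdf(0.0, terms)
```

The multivariate case replaces the scalar variance with a covariance matrix; the univariate form is used here only to keep the sketch short.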

The basic stochastic approach to parameter estimation is to recursively update the estimate $\Psi_n$ of the true parameters $\Psi_0$ based on the latest estimate and the newest data point $x_{n+1}$:

$$\Psi_{n+1} = \Psi_n + U_n(x_{n+1}; \Psi_n) \qquad (2)$$

for some update function $U_n$. The specific form of the update equation that we use is the one suggested by Titterington (1984). If we let $I(\Psi)$ be the Fisher information then the version of the recursive update formula we will use is

$$\Psi_{n+1} = \Psi_n + \left(n I(\Psi_n)\right)^{-1} \frac{\partial}{\partial \Psi} \log f(x_{n+1}; \Psi_n) \qquad (3)$$

where the derivative represents the vector of partial derivatives with respect to the terms of $\Psi$.

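In the case of mixtures of multivariate normals we may write the recursive update equations as

$$\tau_{n+1}^{(i)} = \frac{\pi_n^{(i)} N(x_{n+1}; \mu_n^{(i)}, \Sigma_n^{(i)})}{\sum_{j=1}^{g} \pi_n^{(j)} N(x_{n+1}; \mu_n^{(j)}, \Sigma_n^{(j)})} \qquad (4)$$

$$\pi_{n+1}^{(i)} = \pi_n^{(i)} + \frac{1}{n}\left(\tau_{n+1}^{(i)} - \pi_n^{(i)}\right) \qquad (5)$$

$$\mu_{n+1}^{(i)} = \mu_n^{(i)} + \frac{\tau_{n+1}^{(i)}}{n \pi_n^{(i)}}\left(x_{n+1} - \mu_n^{(i)}\right), \text{ and} \qquad (6)$$

$$\Sigma_{n+1}^{(i)} = \Sigma_n^{(i)} + \frac{\tau_{n+1}^{(i)}}{n \pi_n^{(i)}}\left[\left(x_{n+1} - \mu_n^{(i)}\right)\left(x_{n+1} - \mu_n^{(i)}\right)^{T} - \Sigma_n^{(i)}\right]. \qquad (7)$$

Here $\tau_{n+1}^{(i)}$ is the estimated posterior probability of $x_{n+1}$ belonging to the $i$th term of the mixture, $\pi_{n+1}^{(i)}$ is the estimated mixing coefficient, $\mu_{n+1}^{(i)}$ is the $d$ dimensional estimated mean, and $\Sigma_{n+1}^{(i)}$ is the $d \times d$ estimated covariance matrix of the $i$th term.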
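
A single recursive update can be sketched as follows for the univariate case. This is an illustration under stated assumptions, not the implementation used in the experiments: it assumes the standard recursive form in which each term's posterior probability is computed first and the mean and variance are then moved with step size equal to that posterior divided by $n$ times the term's mixing proportion. The function names and tuple layout are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def recursive_update(x_new, n, terms):
    """Apply one recursive update given the (n+1)-th observation x_new.

    `terms` is a list of (pi, mu, var) tuples describing the current
    estimate after n observations; a new list of the same shape is returned.
    """
    weighted = [p * normal_pdf(x_new, m, v) for p, m, v in terms]
    total = sum(weighted)
    updated = []
    for (p, m, v), w in zip(terms, weighted):
        tau = w / total                 # posterior probability of the term
        gain = tau / (n * p)            # step size for mean and variance
        new_p = p + (tau - p) / n       # mixing proportion update
        new_m = m + gain * (x_new - m)  # mean update
        new_v = v + gain * ((x_new - m) ** 2 - v)  # variance update (old mean)
        updated.append((new_p, new_m, new_v))
    return updated
```

Because the posterior probabilities sum to one across terms, the updated mixing proportions continue to sum to one after each step.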

The adaptive mixtures density estimation (AMDE) stochastic approximation approach is to recursively update $\Psi_t$, the estimate of the true parameters $\Psi_0$, while at the same time providing the capability to expand the extent of the parameter space if dictated by the underlying complexity of the data. We note that in the AMDE case our parameter space is given by $\Psi = \{\pi_1, \theta_1, \pi_2, \theta_2, \ldots\}$. The procedure

$$\Psi_{t+1} = \Psi_t + A\, U_t(x_{t+1}; \Psi_t) + B\, C_t(x_{t+1}; \Psi_t) \qquad (8)$$

is used to recursively update the density, where $A = [1 - P_t(x_{t+1}; \Psi_t)]$ and $B = P_t(x_{t+1}; \Psi_t)$. $P_t$ represents a possibly stochastic create decision and takes on values 0 or 1. $U_t$ updates the current parameters using equations (4-7) while $C_t$ adds a new term to the model. As is implicit in the equation, the decision to add a new term is a function of the current data point, our current estimate of the parameters, and time. The time dependence is important in those cases in which we wish to anneal the probability of creation as a function of training time. The models produced by the AMDE procedure are good functional estimates, but are typically overdetermined with regard to the number of terms.
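
The create decision and the create step can be sketched as below. The text does not specify the decision rule, so everything here is an assumption made for illustration: the rule creates a term when the new point lies farther than a fixed squared Mahalanobis distance from every existing term, and the new term is centred on the point with proportion $1/n$ and a user-supplied initial variance. The names `create_decision`, `create_term`, `threshold`, and `initial_var` are hypothetical.

```python
def create_decision(x_new, terms, threshold=4.0):
    """Return 1 if x_new is far from every existing term, else 0.

    Hypothetical rule: compare the squared Mahalanobis distance
    (x - mu)^2 / var of each univariate term with `threshold`.
    """
    for _, mean, var in terms:
        if (x_new - mean) ** 2 / var <= threshold:
            return 0
    return 1


def create_term(x_new, n, terms, initial_var=1.0):
    """Add a term centred on x_new with proportion 1/n.

    Existing proportions are rescaled by (1 - 1/n) so that the
    proportions still sum to one (an assumed initialization).
    """
    new_pi = 1.0 / n
    rescaled = [(p * (1.0 - new_pi), m, v) for p, m, v in terms]
    return rescaled + [(new_pi, x_new, initial_var)]
```

A time-dependent threshold would be one simple way to anneal the probability of creation over the course of training.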

2.2  Approaches  to  AIC  based  pruning  of  AMDE  generated  mixture  models 

Previous work in the literature has examined the application of the AIC to the determination of the number of terms in a finite mixture (Liang, Jaszczak, and Coleman, 1992). The quantity AIC/$n$ estimates $-2$ times the expected value of the log likelihood of the estimated model (Akaike, 1972),

$$\frac{\mathrm{AIC}}{n} \approx -2\, E\left[\int f_0(x) \log \hat{f}(x)\, dx\right], \qquad (9)$$

with $\hat{f}$ denoting the estimated model. AIC is defined in terms of the likelihood, $L$, and the number of free parameters in the model, $M$, as

$$\mathrm{AIC}(f) = -2\ln(L) + 2M = -2\ln\left[\prod_{j=1}^{n} f(x_j)\right] + 2M. \qquad (10)$$

One uses the AIC to choose between models of differing complexities by selecting the model with the minimum AIC. This choice is equivalent to maximizing the mean likelihood of the model.
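
The AIC of a univariate normal mixture can be computed as in the following sketch. The parameter count used here, $M = 3g - 1$ for $g$ terms ($g$ means, $g$ variances, and $g - 1$ independent mixing proportions), is an assumption of this example; the text does not state how $M$ was counted.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, terms):
    """Sum over observations of the log of the mixture density."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v) for p, m, v in terms))
        for x in data
    )


def free_parameters(g):
    """Assumed count: g means + g variances + (g - 1) mixing proportions."""
    return 3 * g - 1


def aic(data, terms):
    """AIC = -2 ln(L) + 2 M for a list of (pi, mu, var) terms."""
    return -2.0 * log_likelihood(data, terms) + 2.0 * free_parameters(len(terms))
```

Under this count, removing one term lowers the penalty by 6, so a pruned model is preferred whenever its log likelihood drops by less than 3.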

Using this idea as a starting point we have developed a procedure that uses a single adaptive mixtures density estimate, or a set of them, and produces a pruned model with a lower complexity. This procedure uses AIC to evaluate the appropriateness of lower complexity models that have been subjected to the iterative EM method. In the iterative EM method the update equation takes the form

$$\Psi_{n+1} = \Psi_n + \Phi(X; \Psi_n) \qquad (11)$$

where $\Phi$ is the update function and $X$ is the set of observations. In the case of mixtures of multivariate normals we may write the iterative update equations as

$$\tau_{ij} = \frac{\pi_i N(x_j; \mu_i, \Sigma_i)}{\sum_{l=1}^{g} \pi_l N(x_j; \mu_l, \Sigma_l)}, \qquad (12)$$

$$\pi_i = \frac{1}{n} \sum_{j=1}^{n} \tau_{ij}, \qquad (13)$$

$$\mu_i = \frac{1}{n \pi_i} \sum_{j=1}^{n} \tau_{ij}\, x_j, \text{ and} \qquad (14)$$

$$\Sigma_i = \frac{1}{n \pi_i} \sum_{j=1}^{n} \tau_{ij} \left(x_j - \mu_i\right)\left(x_j - \mu_i\right)^{T}. \qquad (15)$$

Here $\tau_{ij}$ is the estimated posterior probability that $x_j$ belongs to term $i$, $\pi_i$ is the estimated mixing coefficient, $\mu_i$ is the $d$ dimensional estimated mean vector, and $\Sigma_i$ is the $d \times d$ estimated covariance matrix for the $i$th term.
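
One pass of the iterative EM update can be sketched as follows for univariate normal terms. This is a minimal illustration assuming the standard EM equations for normal mixtures; the function name `em_step` and the tuple layout are choices made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(data, terms):
    """Run one EM iteration over all observations.

    `terms` is a list of (pi, mu, var) tuples; a new list is returned.
    """
    n = len(data)
    # E-step: posterior probability of each term for each observation.
    tau = []
    for x in data:
        weighted = [p * normal_pdf(x, m, v) for p, m, v in terms]
        total = sum(weighted)
        tau.append([w / total for w in weighted])
    # M-step: re-estimate proportion, mean, and variance of each term.
    new_terms = []
    for i in range(len(terms)):
        weight = sum(t[i] for t in tau)  # equals n * pi_i
        pi_i = weight / n
        mu_i = sum(t[i] * x for t, x in zip(tau, data)) / weight
        var_i = sum(t[i] * (x - mu_i) ** 2 for t, x in zip(tau, data)) / weight
        new_terms.append((pi_i, mu_i, var_i))
    return new_terms
```

In practice the step is repeated until the log likelihood stops improving, and a lower bound on the variance is often imposed to avoid degenerate terms.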


The  steps  in  our  pruning  procedure  are  as  follows. 


Step 1 - Obtain $f_g$, an initial adaptive mixtures approximation to $f_0$ containing $g$ terms.

Step 2 - Compute the AIC of each of the $g-1$ term models after application of the EM method of equations (12-15) to each of the models.

Step 3 - If $\mathrm{AIC}(f_{g-1}) < \mathrm{AIC}(f_g)$ for one of the $g-1$ term models then the pruning process is repeated using this model.

Step 4 - Repeat this process of pruning and expectation maximization until no further improvement is possible.


It is important to point out that at each pruning step the remaining terms are updated based on their Mahalanobis distance to the pruned term prior to updating with the EM method.
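
The pruning steps can be sketched as the greedy loop below for univariate data. Several details are simplifying assumptions of this sketch rather than the procedure described above: the initial model is itself expectation maximized before comparison, the lowest-AIC candidate is chosen when more than one improves, the pruned term's proportion is redistributed by simple renormalization instead of the Mahalanobis distance based update, $M$ is taken as $3g - 1$, and small numerical floors are added.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def aic(data, terms):
    """-2 ln(L) + 2M with the assumed count M = 3g - 1."""
    loglik = sum(
        math.log(max(sum(p * normal_pdf(x, m, v) for p, m, v in terms), 1e-300))
        for x in data
    )
    return -2.0 * loglik + 2.0 * (3 * len(terms) - 1)


def em(data, terms, iterations=50, min_var=1e-3):
    """Fixed number of EM iterations with a variance floor (assumed values)."""
    n = len(data)
    for _ in range(iterations):
        resp = []
        for x in data:
            w = [p * normal_pdf(x, m, v) for p, m, v in terms]
            s = sum(w) or 1e-300
            resp.append([wi / s for wi in w])
        updated = []
        for i in range(len(terms)):
            weight = sum(r[i] for r in resp) or 1e-300
            mu = sum(r[i] * x for r, x in zip(resp, data)) / weight
            var = sum(r[i] * (x - mu) ** 2 for r, x in zip(resp, data)) / weight
            updated.append((weight / n, mu, max(var, min_var)))
        terms = updated
    return terms


def drop_term(terms, index):
    """Remove one term and renormalize the remaining proportions."""
    kept = [t for k, t in enumerate(terms) if k != index]
    total = sum(p for p, _, _ in kept)
    return [(p / total, m, v) for p, m, v in kept]


def prune(data, terms):
    """Greedily remove terms while the AIC keeps decreasing."""
    terms = em(data, terms)
    best = aic(data, terms)
    while len(terms) > 1:
        candidates = [em(data, drop_term(terms, k)) for k in range(len(terms))]
        scores = [aic(data, c) for c in candidates]
        k_best = min(range(len(scores)), key=scores.__getitem__)
        if scores[k_best] >= best:
            break  # no (g-1) term model improves on the current model
        terms, best = candidates[k_best], scores[k_best]
    return terms
```

Each candidate at a given level is fitted independently of the others, which is the source of the parallelism available in the procedure.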


Figure 1 illustrates the pruning process. The log likelihoods for the true model, the original ten term model, and the pruned and subsequently expectation maximized models are plotted. In this case the process was able to reduce a ten term model of the mixture .5*N(-2,1) + .5*N(2,1) to the appropriate two term model. This case will be discussed in Section 3.

[FIGURE 1 SHOULD GO ABOUT HERE]

3.  Results


This pruning procedure was tested on data sets drawn from two different bimodal two term distributions, one four mode four term distribution, and a standard unimodal normal distribution, and on the Buffalo snowfall data (Parzen, 1979); see Figure 2. In each simulated case 10,000 points were drawn from the distribution. The snowfall data consisted of 63 points.

Twenty-five  bootstrap  resamples  were  extracted  from  each  of  the  data  sets  using  their 
empirical  distributions  (Efron  and  Tibshirani,  1993).  These  resamples  are  used  in  a  way 
that  is  slightly  different  from  the  standard  procedure.  In  standard  bootstrapping  one  uses 
the resamples to estimate the standard error of a statistic whose standard error is not available in closed form. Our goal in bootstrapping is the production of a distribution on the
number  of  terms  in  the  models  after  AIC  based  pruning.  This  distribution  can  then  be  used 
to  estimate  the  number  of  terms  in  the  true  model. 
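
The resampling scheme can be sketched as follows. The callable `count_terms` stands in for the full create, prune, and count pipeline and is a hypothetical interface introduced for this example; the seed and function names are likewise assumptions.

```python
import random
from collections import Counter


def bootstrap_resamples(data, n_resamples=25, seed=0):
    """Draw resamples of the same size as `data`, with replacement,
    from the empirical distribution of the data."""
    rng = random.Random(seed)
    n = len(data)
    return [
        [data[rng.randrange(n)] for _ in range(n)]
        for _ in range(n_resamples)
    ]


def complexity_distribution(data, count_terms, n_resamples=25, seed=0):
    """Tabulate the number of terms returned for each resample.

    `count_terms` maps a resample to the number of terms left after
    pruning (hypothetical callable supplied by the user).
    """
    counts = [
        count_terms(resample)
        for resample in bootstrap_resamples(data, n_resamples, seed)
    ]
    return Counter(counts)
```

The returned counter maps each observed number of terms to the number of resamples that produced it, which is the model complexity distribution discussed in the text.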

An up to ten term adaptive mixtures model was created for each of the resampled data sets. Each of these models was then subjected to the AIC based pruning process. This process provides a model complexity distribution based on the data set.

[FIGURE  2  SHOULD  GO  ABOUT  HERE] 

In Figures 3a and 3b we present adaptive mixtures solutions for two of the resamplings of the data set drawn from a(x) = .5*N(-2,1) + .5*N(2,1). We have included dF space plots along with the standard functional representation of the distributions. dF space plots are an effective way to display the terms in a mixture. Each term $\pi_i N(\mu_i, \sigma_i^2)$ is plotted as a circle whose radius is proportional to $\pi_i$ and whose center is given by $(\mu_i, \sigma_i^2)$. Where it is hard to discern the distributional structure from a standard function plot it is quite easy in a dF space plot. We notice that the terms in each of the two solutions are markedly different. This phenomenon falls under the adage that there is “more than one way to skin a cat” when producing a functional estimate. We also notice that there are more than the “theoretical” number of terms needed. Each of the models is made up of ten terms. The occurrence of a matching number of terms in each model is the result of our initial constraint on the model complexity. It is important to note that though the terms are different in each solution the location and number of modes are not, and that there are terms that are superfluous to the minimal representation of the distribution.

[FIGURE  3  SHOULD  GO  ABOUT  HERE.] 

Table 1 illustrates the results of the pruning process. For each of the five distributions we have listed the number of terms in the final pruned models for each of the twenty-five resamples. The procedure converged to the correct solution 11 of 25 times in case a, 7 of 25 times in case b, 17 of 25 times in case c, and 3 of 25 times in case d. The procedure converged to a 3 term solution 4 of the 25 times in the case of the snowfall data. The appropriate solution for the case of the snowfall data will be the subject of later discussions.

[TABLE 1 SHOULD GO ABOUT HERE.]

We  may  estimate  the  model  complexity  through  the  use  of  statistical  measures  on  this 
distribution.  For  example  one  could  choose  the  minimal  order  statistic  as  the  measure  of 
the  number  of  terms  in  the  minimal  complexity  mixture  model  that  characterizes  the  data. 
This  choice  has  the  advantage  that  it  represents  the  lowest  complexity  model  obtainable 
from  the  procedure.  Alternatively  one  could  use  the  expected  value  of  the  distribution. 
This  choice  indicates  the  average  complexity  of  the  mixture  models  that  represent  the  data 
set. 
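
Both summaries are simple to compute once the term counts are in hand. In the sketch below the list `counts` is a hypothetical set of five term counts chosen only to exercise the functions; it is not taken from Table 1.

```python
def minimal_complexity(term_counts):
    """Minimal order statistic: the smallest number of terms obtained."""
    return min(term_counts)


def mean_complexity(term_counts):
    """Expected value of the empirical model complexity distribution."""
    return sum(term_counts) / len(term_counts)


# Hypothetical term counts from five resamples (illustrative only).
counts = [2, 3, 2, 4, 2]
lowest = minimal_complexity(counts)
average = mean_complexity(counts)
```

The minimal order statistic is sensitive to a single resample that is pruned too aggressively, whereas the mean is pulled upward by resamples that retain superfluous terms.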

Table 2 presents the average $L_1$ error between the true mixture model and the pruned model for each of the first 4 cases for each of the obtained model complexities. No $L_1$ results were provided for the snowfall data since the true underlying model is unknown. It is encouraging to note that in each case the minimum average error occurs at the appropriate level of model complexity.
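
The $L_1$ error between two univariate mixtures can be approximated numerically, as in the sketch below. The trapezoidal rule, the integration limits, and the number of steps are assumptions of this example; the text does not say how the integral was evaluated.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, terms):
    """Mixture density for a list of (pi, mu, var) terms."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in terms)


def l1_error(terms_a, terms_b, lower=-12.0, upper=12.0, steps=4800):
    """Trapezoidal approximation of the integral of |f_a(x) - f_b(x)|.

    The limits should cover essentially all of the mass of both models.
    """
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        diff = abs(mixture_pdf(x, terms_a) - mixture_pdf(x, terms_b))
        total += diff if 0 < k < steps else 0.5 * diff
    return total * h
```

The error lies between 0 for identical densities and 2 for densities with essentially disjoint support.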

[TABLE  2  SHOULD  GO  ABOUT  HERE.] 

The  number  of  modes  in  the  snowfall  data  has  been  the  topic  of  continued  debate 
throughout the history of density estimation. Arguments have been made in favor of trimodal (Scott, 1992) and unimodal structure (Scott, 1994). In Figure 4a we compare the output of the pruning process for one of the 3 term models to a standard kernel estimator with a bandwidth of 6. The bandwidth of 6 was chosen as an appropriate setting to illustrate the trimodality of the data (Silverman, 1986). The 3 term model has been expectation
maximized  against  the  original  data  in  order  to  make  a  fair  comparison  between  the  two. 
We  note  that  the  two  models  are  very  similar  in  character  and  specifically  with  regard  to 
the  mode  placement.  In  fact  if  we  drop  the  bandwidth  down  to  4  we  can  obtain  a  solution 
that  is  even  closer  in  character  to  our  mixture  model,  see  Figure  4b. 
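
For reference, a fixed bandwidth kernel estimate can be sketched as follows. The Gaussian kernel is an assumption of this example, since the text specifies only the bandwidth, and the function names are illustrative.

```python
import math


def gaussian_kernel(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_estimate(x, data, bandwidth):
    """Kernel density estimate at x: one kernel of width `bandwidth`
    centred on each observation, averaged over the sample."""
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in data) / (
        len(data) * bandwidth
    )
```

Viewed as a mixture, this estimator has one term per observation with equal mixing proportions and a common variance, which is why a smaller bandwidth exposes more modes.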

[FIGURE  4  SHOULD  GO  ABOUT  HERE] 

The  last  thing  left  to  be  discussed  is  the  output  of  the  pruning  procedure.  In  Figures  5 
a,  b,  c  and  6  a  and  b  we  present  an  expectation  maximized  adaptive  mixture  solution  along 
with  the  output  of  pruning  this  solution.  We  notice  that  the  number  of  terms  in  the  solution 
has  been  reduced  from  ten  to  the  appropriate  number  in  each  case.  We  also  notice  that  the 
terms  left  from  the  process  are  in  approximately  the  correct  location  and  have  about  the 
right  mixing  coefficients  and  variances. 

[FIGURES  5  AND  6  HERE] 


4.  Conclusions 


The AMDE procedure provides a data driven method for obtaining a good mixture model density estimate. The convergence properties of the procedure tend to guarantee that the model will be of higher complexity than the true density if the latter is a finite mixture. The exceptions to this occur when the sample size is small enough that too few terms are created by the AMDE. The AMDE thus provides a useful mixture model estimate of an unknown density.

The AIC provides a convenient tool for evaluating appropriate model complexity. It serves as a good “rule of thumb” in choosing between models. Under appropriate conditions it has the capability to help reveal the underlying mixture which generates a data set.


In  many  cases  we  have  reason  to  believe  that  the  unknown  density  is  a  mixture  model 
but  of  unknown  complexity.  Then  we  are  often  interested  in  the  structure  of  the  underlying 
mixture  model.  It  is  in  this  case  that  AIC  based  pruning  can  be  used  to  find  not  only  an 
“optimal”  model  but  also  a  distribution  of  pruned  models  which  provides  some  knowledge 
about  the  true  density. 

In this paper we have presented a new technique to help determine the unknown structure of a mixture model. This technique uses a set of adaptive mixtures solutions that have
been  subject  to  AIC  based  pruning  to  help  determine  the  minimum  complexity  mixture 
model  that  best  characterizes  the  data.  The  goal  of  this  technique  is  the  production  of  a 
more  parsimonious  mixture  model  of  an  unknown  distribution. 


This approach embodies the spirit in which the AIC should be used in that one is comparing two maximum likelihood solutions. There is a penalty with regard to computational complexity that occurs in the production of expectation maximized models at each step of the pruning process. However, the pruning procedure is highly parallel in nature and we would expect substantial speedups on a MIMD machine.

Future work will focus on extending this technique to multivariate distributions, on a more in-depth analysis of the theoretical underpinnings of the approach, and on parallelization of the procedure. We also plan to pursue better term creation techniques within the AMDE framework. Finally, we hope to produce an AMDE-like estimator that uses both term creations and annihilations.

FIGURE AND TABLE CAPTIONS


Figure  1  -  Pruning  curves  for  the  reduction  of  a  10  term  model  to  a  2  term  model. 

Figure 2 - Test cases

(a) - a(x) = .5*N(-2,1) + .5*N(2,1),

(b) - a(x) = .5*N(-1.25,1) + .5*N(1.25,1),

(c) - a(x) = .25*N(-6,1) + .25*N(-2,1) + .25*N(2,1) + .25*N(6,1),

(d) - a(x) = N(0,1),

(e) - The Buffalo snowfall data

Figures  3  a  and  b  -  Adaptive  mixtures  estimates  for  two  of  the  resamplings  of  the  data  set 
drawn  from  case  a. 

Figures 4 a and b - Comparison of the pruned 3 term model which has been expectation maximized against kernel estimates of the original Buffalo snowfall data with bandwidths of 6 and 4.

Figures  5  a,  b,  and  c  -  Expectation  maximized  adaptive  mixtures  estimates  along  with  the 
output  of  the  pruning  process  for  the  first  three  cases. 

Figures  6  a  and  b  -  Expectation  maximized  adaptive  mixtures  estimates  along  with  the 
output  of  the  pruning  process  for  the  last  two  cases. 

Table  1:  Number  of  terms  for  each  case. 

Table 2: Average $L_1$ error for each test case for each model complexity.